Word Embedding Calculus in Meaningful Ultradense Subspaces
Abstract
We decompose a standard embedding space into interpretable orthogonal subspaces and a “remainder” subspace. We consider four interpretable subspaces in this paper: polarity, concreteness, frequency and part-of-speech (POS) subspaces. We introduce a new calculus for subspaces that supports operations like “−1 × hate = love” and “give me a neutral word for greasy” (i.e., oleaginous). This calculus extends analogy computations like “king−man+woman = queen”. For the tasks of Antonym Classification and POS Tagging our method outperforms the state of the art. We create test sets for Morphological Analogies and for the new task of Polarity Spectrum Creation.
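To make the calculus concrete, the sketch below (a minimal illustration, not the authors' implementation) rotates an embedding into an interpretable basis with an orthogonal matrix, negates the polarity block, rotates back, and returns the nearest vocabulary word. The matrix Q, the embedding matrix E, the vocab index, and the choice of the first k coordinates as the polarity subspace are all assumed placeholders.

```python
import numpy as np

def negate_polarity(word, E, Q, vocab, k=1):
    """Flip a word's polarity coordinates, e.g. -1 * hate -> love."""
    u = Q @ E[vocab[word]]          # rotate into the interpretable basis
    u[:k] = -u[:k]                  # negate the ultradense polarity block
    v = Q.T @ u                     # rotate back to the original space
    # return the vocabulary word nearest to the modified vector (cosine)
    sims = (E @ v) / (np.linalg.norm(E, axis=1) * np.linalg.norm(v) + 1e-9)
    return max(vocab, key=lambda w: sims[vocab[w]])
```

Zeroing the polarity block (`u[:k] = 0`) instead of negating it would correspond to the "neutral word for greasy" query.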
Similar Papers
Supervised and unsupervised methods for learning representations of linguistic units
Word representations, also called word embeddings, are generic representations, often high-dimensional vectors. They map the discrete space of words into a continuous vector space, which allows us to handle rare or even unseen events, e.g., by considering the nearest neighbors. Many Natural Language Processing tasks can be improved by word representations if we extend the task-specific training ...
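As a hedged illustration of the nearest-neighbor idea mentioned above, the snippet below ranks vocabulary words by cosine similarity to a query word; `E` (a pretrained embedding matrix) and `vocab` (its word-to-row index) are assumed placeholders.

```python
import numpy as np

def nearest_neighbors(word, E, vocab, n=5):
    """Return the n words with the most cosine-similar embeddings."""
    v = E[vocab[word]]
    sims = (E @ v) / (np.linalg.norm(E, axis=1) * np.linalg.norm(v) + 1e-9)
    inv = {i: w for w, i in vocab.items()}     # row index -> word
    ranked = [inv[i] for i in np.argsort(-sims)]
    return [w for w in ranked if w != word][:n]
```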
Ultradense Word Embeddings by Orthogonal Transformation
Embeddings are generic representations that are useful for many NLP tasks. In this paper, we introduce DENSIFIER, a method that learns an orthogonal transformation of the embedding space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We show that ultradense embeddings generated by DENSIFI...
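A minimal sketch of the ultradense construction, under stated assumptions: a random orthogonal matrix stands in for the transformation DENSIFIER would learn, and only the first k << d rotated coordinates are kept.

```python
import numpy as np

d, k = 300, 3                        # e.g. 300-dim embeddings, 3-dim subspace
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal stand-in

def to_ultradense(x, Q, k):
    """Keep only the first k coordinates after the orthogonal rotation."""
    return (Q @ x)[:k]

x = rng.standard_normal(d)
print(to_ultradense(x, Q, k).shape)  # (3,)
```

Because the transformation is orthogonal, distances in the full rotated space are preserved; the compression comes from discarding the remaining d − k coordinates.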
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a query word in a document image. The problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten documents ...
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Deep neural network (DNN) based natural language processing models rely on a word embedding matrix to transform raw words into vectors. Recently, a deep structured semantic model (DSSM) has been proposed to project raw text to a continuously-valued vector for Web Search. In this technical report, we propose learning word embeddings using the DSSM. We show that a DSSM trained on a large body of text ...
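One concrete ingredient of the DSSM input layer is letter-trigram "word hashing", which maps a raw word to sub-word units before the deep projection layers. A minimal sketch (illustrative, not the report's code):

```python
def letter_trigrams(word):
    """DSSM-style word hashing: letter trigrams with '#' boundary markers."""
    w = f"#{word}#"
    return [w[i:i + 3] for i in range(len(w) - 2)]

print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
```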
Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks
Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems. In this paper, we evaluated different word embedding models trained on a large Portuguese corpus, including both Brazilian and European variants. We trained 31 word embedding models using FastText, GloVe, Wang2Vec and Wor...
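The word-analogy evaluation used in such comparisons is easy to sketch: answer "a is to b as c is to ?" with the word nearest to b − a + c. `E` and `vocab` are again assumed pretrained-embedding placeholders.

```python
import numpy as np

def solve_analogy(a, b, c, E, vocab):
    """Answer 'a is to b as c is to ?' via b - a + c, cosine-ranked."""
    v = E[vocab[b]] - E[vocab[a]] + E[vocab[c]]
    sims = (E @ v) / (np.linalg.norm(E, axis=1) * np.linalg.norm(v) + 1e-9)
    inv = {i: w for w, i in vocab.items()}
    ranked = [inv[i] for i in np.argsort(-sims)]
    return next(w for w in ranked if w not in (a, b, c))
```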